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Abstract 

Primal-dual  interior-point  path-following  methods  for  semidefinite 
programming  (SDP)  are  considered.  Several  variants  are  discussed, 
based  on  Newton’s  method  applied  to  three  equations:  primal  feasibil¬ 
ity,  dual  feasibility,  and  some  form  of  centering  condition.  The  focus  is 
on  three  such  algorithms,  called  respectively  the  XZ ,  XZ  +  ZX  and 
Q  methods.  For  the  XZ  +  ZX  and  Q  algorithms,  the  Newton  system 
is  well-defined  and  its  Jabobian  is  nonsingular  at  the  solution,  under 
nondegeneracy  assumptions.  The  associated  Schur  complement  matrix 
has  an  unbounded  condition  number  on  the  central  path,  under  the  non¬ 
degeneracy  assumptions  and  an  additional  rank  assumption.  Practical 
aspects  are  discussed,  including  Mehrotra  predictor-corrector  variants 
and  issues  of  numerical  stability.  Compared  to  the  other  methods  con¬ 
sidered,  the  XZ  \  ZX  method  is  more  robust  with  respect  to  its  ability 
to  step  close  to  the  boundary,  converges  more  rapidly,  and  achieves 
higher  accuracy. 
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1  Introduction 


Let  Sn  denote  the  vector  space  of  real  symmetric  nxn  matrices.  Denote  the 
dimension  of  this  space  by 


2  n(n  +  1) 

nz  =  — - 

2 

The  standard  inner  product  on  Sn  is 

A  •  B  =  tr  AB  =  AijBij. 

kj 

By  X  y  0  {X  y  0),  where  X  6  Sn,  we  mean  that  X  is  positive  semidefinite 
(positive  definite) . 

Consider  the  semidefinite  program  (SDP) 
minxes"  C  •  X 

s.t.  Ak  •  X  —  bk,  k  =  1, . . . ,  m  (2) 

1^0, 


where  b  €  Km,  C  €  Sn ,  and  Ak  €  Sn ,  k  =  1,  .. . m.  The  dual  SDP  is 
maxj/gStni2e5n  bTy 

s.t.  Er=  i  ykAk  +  z  =  c,  (3) 

z  s-  0. 


The  following  are  assumed  to  hold  throughout  the  paper. 

Assumption  1.  There  exists  a  primal  feasible  point  X  y  0,  and  a  dual 
feasible  point  ( y ,  Z)  with  Z  y  0. 

Assumption  2.  The  matrices  Aj.,  k  =  1, . . .,  m,  are  linearly  independent, 
i.e.  they  span  an  m-dimensional  linear  space  in  Sn. 

The  central  path  consists  of  points  (XM,  yu,  Z11)  6  Sn  X  Rm  X  Sn  satisfying 
the  primal  and  dual  feasibility  constraints  as  well  as  the  centering  condition 

X^ZV  =  yl  (4) 

for  some  /<  £  K,  g  >  0.  It  is  well  known  [NN94]  that,  under  Assumptions 
1  and  2,  ( X ,L ,  y'1.  Z,J)  exists  and  is  unique  for  all  y  >  0,  and  that 

(A,  y,  Z)  =  lim  (A'/;,  //,  2?) 

/Li— >0 


(•5) 
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exists  and  solves  the  primal  and  dual  SDP’s.  Furthermore,  because  X M  and 
Zu  commute,  there  exists  an  orthogonal  matrix  Q M  such  that 

=  Q»  Diag(Af ,  DiagK,  ...,«£)  (Q^)T, 

(6) 

where  the  A^  and  uj^,  respectively  the  eigenvalues  of  X M  and  Z^,  satisfy 

Afwf  =  g,  i  =  1, . . .,  n.  (7) 

Without  loss  of  generality,  assume  that 

K>-">K  and  wf  <  •  •  •  <  w«.  (8) 

As  /i  — >  0,  the  centering  condition  (4)  reduces  to  the  complementarity  condi¬ 
tion  XZ  —  0,  implying  that 

X  =  Q  Diag(Ai, . . . ,  A„)  QT ,  Z  -  Q  Diag(uq,  . . . ,  un)  QT ,  (9) 

for  some  orthogonal  matrix  Q,  with  the  eigenvalue  complementarity  condition 
A iU>i  =  0,7=  1 , . . . ,  n.  Observe  that  Aj  and  u>i  are  the  limits  of  A ^  and  as 
g  — >  0,  and  Q  may  be  taken  to  be  a  limit  point  (not  necessarily  unique)  of 
the  set  {Q11  :  ji  >  0}.  We  have 


Ai  >  •  •  •  >  A„  and  uq  <  •  *  •  <  un.  (10) 

Interior  point  methods  for  semidefinite  programming  were  originally  in¬ 
troduced  by  [NN94,  Ali9 1] .  Early  papers  on  primal-dual  methods  include 
[VB95]  and  [HRVW96].  A  preliminary  version  of  the  present  work  appeared 
as  [AH094b].  Convergence  analysis  of  primal-dual  path-following  methods 
for  SDP  appeared  first  in  [KSH94,  NT94,  NT95].  We  are  primarily  con¬ 
cerned  with  four  methods,  which  we  call  respectively  the  XZ ,  XZ  +  ZX , 
Nesterov-Todd  (NT)  and  Q  methods.  The  XZ  method  first  appeared  in 
[HRVW96,  KSH94].  The  XZ  +  ZX  method  was  introduced  in  [AH094b] 
and  was  recently  analyzed  by  [KSS96].  The  NT  method  was  given  by  [NT94, 
NT95]  and  its  implementation  was  recently  discussed  in  [TTT96].  The  Q 
method  originally  appeared  in  [AH094a].  Many  other  papers  on  semidefinite 
programming  have  recently  been  announced. 

The  paper  is  organized  as  follows.  In  Section  2  we  introduce  several  al¬ 
gorithms  in  a  common  framework  based  on  Newton’s  method,  focusing  on 
the  XZ  and  XZ  +  ZX  variants.  In  Section  3  we  study  the  Jacobian  of  the 
Newton  system  for  the  various  methods  under  nondegeneracy  assumptions, 
and  discuss  implications  for  local  convergence  rates.  In  Section  4  we  consider 
the  conditioning  of  the  Schur  complement  matrix  on  the  central  path,  again 


Semidefinite  Programming 


4 


under  nondegeneracy  assumptions.  This  leads  to  the  issue  of  numerical  sta¬ 
bility,  discussed  in  Section  5.  We  introduce  the  Q  method  in  Section  6.  In 
Section  7,  we  present  computational  results. 

Our  main  focus  is  on  the  nondegenerate  case;  this  assumption  (defined  in 
Section  3)  implies  unique  primal  and  dual  solutions.  We  take  the  view  that  it 
is  important  to  understand  how  methods  behave  on  nondegenerate  problems. 
This  does  not  discount  the  significance  of  degenerate  problems  that  may  arise 
in  applications,  as  is  common  in  linear  programming  (LP). 

A  word  about  notation:  we  use  the  symbols  X ,  y  and  Z  to  mean  several 
things.  Depending  on  the  context,  they  may  refer  to  the  variables  of  the  SDP, 
the  iterates  generated  by  a  method,  or  a  solution  of  the  SDP. 


2  The  Methods  in  a  General  Framework 


We  consider  only  primal-dual  interior-point  path-following  methods,  generat¬ 
ing  a  sequence  of  iterates  approximating  the  central  path  and  converging  to 
the  primal  and  dual  solutions.  See  [Wri96]  for  a  detailed  discussion  of  such 
methods  for  LP.  In  LP,  the  basic  iterative  step  can  be  readily  derived  using 
Newton’s  method.  For  SDP,  points  on  the  central  path  satisfy  the  nonlinear 
equation 

rEr=i  VkAk  +  Z-Cl 
Ai  •  X  —  bi 


d  m  •  X  bm 
XZ  -  gl 


(11) 


However,  the  matrix  XZ  is  not  symmetric  in  general.  Consequently,  the 
domain  and  range  of  the  function  defined  by  the  left-hand  side  of  (11)  are  not 
the  same  spaces,  and  Newton’s  method  is  not  directly  applicable.  For  LP,  on 
the  other  hand,  the  standard  primal-dual  interior-point  method  is  obtained 
by  applying  Newton’s  method  to  (11).  In  this  case,  X  and  Z  are  diagonal, 
and  XZ  is  also  diagonal,  so  the  domain  and  range  of  (11)  reduce  to  R2n+m. 

A  key  question  in  formulating  primal-dual  interior-point  methods  for  SDP 
is  therefore:  how  should  one  appropriately  formulate  Newton’s  method?  We 
consider  here  two  possibilities.  Other  choices  are  discussed  at  the  end  of  this 
section. 


The  XZ  Method.  Use  the  centering  condition  (4)  directly  and  view  the 
left-hand  side  of  (11)  as  a  function  whose  domain  and  range  are  both 
U  =  R"x”  xRm  xKnX".  Then  Newton’s  method  is  well  defined,  though 
the  iterates  are  not  symmetric  matrices.  (Actually,  only  the  X  iterates 


Semidefinite  Programming 


5 


are  not  symmetric,  since  the  dual  feasibility  equation  forces  Z  to  be 
symmetric.)  The  X  iterates  can  then  be  explicitly  symmetrized  before 
continuing  with  the  next  iteration.  Consequently,  this  method  is  not 
strictly  a  Newton  method.  A  different  iteration  is  obtained  by  using 
ZX  =  fit  instead  of  (4). 

The  XZ  +  ZX  Method.  Rewrite  (4)  in  the  symmetric  form 

XZ  +  ZX  =  2gl.  (12) 

Substituting  (12)  for  (4)  in  (11)  gives  a  mapping  with  domain  and  range 
both  given  by  V  =  Sn  X  Rm  X  Sn.  Application  of  Newton’s  method  to 
(12)  leads  to  symmetric  matrix  iterates  X  and  Z. 

We  observe  that  (4)  and  (12)  are  equivalent  when  X  >  0  (or  Z  >  0). 
That  (4)  implies  (12)  is  immediate.  That  the  converse  holds  for  X  >  0  is 
seen  by  using  X  =  QAQT  to  reduce  (12)  to  A (QTZQ)  +  (QTZQ) A  =  2 fil, 
with  A  diagonal  and  nonnegative  and  QTQ  =  /.  The  entries  on  the  left-hand 
side  are  (A,-  +  A j){QZQT)ij,  and  so,  since  the  off-diagonal  entries  must  be 
zero,  either  \  —  Aj  —  0  or  ( QZQT)ij  =  0  when  i  ^  j.  Thus,  A (QT ZQ)  is 
diagonal,  and  (4)  holds. 

We  now  examine  the  steps  defined  by  these  methods  in  more  detail.  The 
Newton  step  for  the  XZ  method  satisfies  the  linear  equation 

X  AZ  +  A XZ  =  ///  -  XZ.  (13) 

2 

Let  nvec  map  KnXra  to  Kn  ,  stacking  the  columns  of  a  matrix  in  a  vector. 
Then  we  may  rewrite  (13)  in  the  form 

(1 1(g)  X)  nvec  (AZ)  +  (Z^-I)  nvec  (AX)  =  nvec  {gl  —  XZ).  (14) 

where  .§§  denotes  the  standard  Kronecker  product  (see  Appendix,  equation 

(•59))- 

To  discuss  the  XZ  +  ZX  method,  we  introduce  a  symmetric  version  of 
the  Kronecker  product.  The  Newton  correction  for  (12)  satisfies  the  linear 
equation 


XAZ+AZX  +  AXZ+Z  AX  =  2 gl  -  XZ  -  ZX ,  (15) 

where  AX  and  AZ  are  symmetric.  Let  svec  be  an  isometry  identifying 
Sn  with  Mn”,  so  that  K  •  L  =  svec  (A') T  svec  (L)  for  all  K.  !.  6  Sn  (see 
Appendix).  Then  (15)  can  be  written  as: 

(Z  ®  I)  svec  (AX)  +  (X  <§)  I)  svec  (AZ)  =  svec  (/</  —  —  [XZ+  ZX))  (16) 
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where  ©  denotes  the  symmetric  Kronecker  product  defined  in  the  Appendix 
(see  (62)). 

We  shall  now  describe  both  methods  in  a  common  framework.  Let  vec 
denote  either  nvec  or  svec,  depending  on  the  context.  Specifically,  vec  will 
mean  nvec  in  the  case  of  the  XZ  method  and  svec  otherwise.  The  inverse 
of  vec  is  denoted  by  mat .  We  shall  use  lower  case  letters  x  and  2  to  denote 
veclf  and  vec Z  respectively,  and  we  shall  use  Ax  and  Az  interchangeably 
with  vec  AX  and  vec  AZ,  to  be  defined  shortly. 

Let 

I"  ( vecAi)T  1 


A  = 


L( 


vec 


(17) 


and  define 

rp  =  b  —  Ax,  R,j  — •  C  Z  mat  A T y, 

and 

Rc  = 

f  fil-XZ 
\  fil  -  \{XZ  +  ZX) 

X  Z  method  1 

XZ  +  ZX  method  J 

(18) 

with 

rd  =  vec  Rd, 

rc  =  vec  Rc. 

Let 

G(x,  y,  z)  = 

'  ~rd~ 

-rp  . 

.  ~rc_ 

(19) 

Note  that  G  maps  U  to  U  in  the  case  of  the  XZ  method  and 
Application  of  one  step  of  Newton’s  method  to  G(x,  y,  z )  = 
system 

'  0  At  I 
A  0  0 

.E  0  F 


'Ax' 

'rd' 

Ay 

= 

rp 

_A  z_ 

.rc. 

V  to  V  otherwise. 
0  gives  the  linear 


(20) 


Here 


E  = 


Z®I 

z  ©  / 


XZ  method 
XZ  +  ZX  method 


and 


F  = 


I  ®X 
X  ©  / 


XZ  method 
XZ  +  ZX  method 


and  I  is  the  identity  matrix  of  appropriate  dimension  (I®  I  for  the  XZ  method 
and  /  ©  I  for  the  XZ  +  ZX  method).  We  denote  the  Jacobian  matrix  on  the 
left-hand  side  of  (20)  by  J. 
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Applying  block  Gauss  elimination,  (20)  reduces  to  the  system 


'  -F-1E  At  ’ 

Ax 

a- 

1 

1 

"S 

Cl 

_ i 

A  0 

Ay 

rp  I 

A  second  step  of  block  Gauss  elimination  gives 

MAj/  =  rp  +  AE-1  (F rd  -  rc) , 

Ax  =  — E_1  (F(r, d  -  AT Ay)  -  rc 
and,  from  the  dual  feasibility  equation, 

A z  =  rd-  AT Ay, 

where 

M  =  AE_1FAt. 


(21) 

(22) 

(23) 

(24) 

(25) 


We  call  M  the  Schur  complement  matrix.  The  main  computational  work  is 
the  formation  and  factorization  of  M.  The  kth  column  of  the  matrix  E-1FAr 


is 


nvec(X  AkZ  1) 
svec (Lk) 


XZ  method 
XZ  +  ZX  method 


where  Lk  is  the  solution  of  the  Lyapunov  equation  (see  Appendix) 


Z  Lk  +  L).Z  —  XAk  +  A  /,•  A .  (26) 

Formation  of  M  thus  requires  a  Cholesky  factorization  of  Z,  in  the  case  of 
the  XZ  method,  and  an  eigenvalue  factorization  of  Z,  in  the  case  of  the 
XZ  +  ZX  method.  If  m  n  and  we  neglect  sparsity  considerations,  the 
additional  cost  of  the  eigenvalue  factorization  is  negligible  in  comparison  to 
the  other  operations  required  to  form  and  factor  M. 

It  is  clear  that,  as  long  as  X  y  0  and  Z  y  0,  nonsingularity  of  the 
Jacobian  matrix  J  is  equivalent  to  nonsingularity  of  the  Schur  complement 
M.  In  the  case  of  the  XZ  method,  M  is  symmetric  and  positive  definite.  In 
the  case  of  the  XZ  +  ZX  method,  M  is  not  symmetric,  but  can  be  shown 
to  be  nonsingular  if  XZ  +  ZX  y  0  [SSK96].  Equation  (22)  is  solved  by 
using  a  Cholesky  factorization  of  M  in  the  case  of  the  XZ  method  and  an 
LU  factorization  of  M  in  the  case  of  the  XZ  +  ZX  method. 

For  the  XZ  +  ZX  method,  the  multiplications  by  E-1  in  (22)  and  (23) 
require  the  solution  of  Lyapunov  equations,  using  the  eigenvalues  of  Z  already 
computed  to  form  M. 

Both  methods  are  then  described  by  the  following: 


Semidefinite  Programming 


Basic  Iteration. 


1.  Choose  0  <  cr  <  1  and  define 


X*Z 

g  =  o - 

n 


(27) 


2.  Determine  AX,  Ay,  A Z  from  (20),  equivalently  (22)-(24). 

3.  In  the  case  of  the  XZ  method,  replace  AX  by  |(AX  +  AXT). 

4.  Choose  steplengths  a,  ft  and  update  the  iterates  by 

X  <-  X  +  a  AX 
V  y  +  ftAy 

Z  <-  Z  +  ft  AZ. 


Rules  for  defining  a  will  be  discussed  later.  A  simple  steplength  rule  is 
given  by  choosing  a  parameter  r,  0  <  r  <  1,  and  defining 


a  =  min(l,  r  d) 

d  =  supjct  :  X  +  a  AX  D  0} 

(28) 

ft  -  min(l,  t  ft) 

ft  =  sup{/l  :  Z  +  ft  AZ  y  0}. 

(29) 

Note  that,  except  in  the  case  AX  >  0,  we  have  0  <  a  <  oc  with 

Q_1  =  Amax(-AX,X), 

where  Amax(A,  B)  means  the  largest  eigenvalue  of  the  pencil  (A,  B ),  i.e.  the 
largest  root  of  det(AB  —  A)  =0. 

Other  methods  can  also  be  defined  in  the  same  framework;  two  of  these 
are  discussed  below.  See  [Zha95]  for  a  class  of  methods  that  includes  the 
XZ  +  ZX  method,  and  [KSH94,  SSK96]  for  another  class  that  includes  all 
those  discussed  here  except  the  XZ  +  ZX  method. 

The  X-1  Method.  Replace  R,  in  (19)  by  Rc  =  gX_1  —  Z,  so  E  =  gX_1  ® 
X-1,  F  =  I® I.  A  similar  method  can  be  defined  with  Rc  =  gZ-1  —  X. 
In  fact,  the  method  given  by  [VB95]  is  based  on  a  combination  of  these 
two  steps. 

The  Nest.erov-Todd  Method.  Use  Rc  =  gX_1  —  Z,  E  =  W~l  ®  I'D-1,  F  = 
I  ®  I,  where  ID  =  X1I2{X1/2ZX1I2)~1I2X1I2 .  This  form  does  not 
actually  arise  from  applying  Newton’s  method  to  (19).  However,  see 
[TTT96]  for  a  Newton  interpretation  of  this  method. 
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As  long  as  X  y  0  and  Z  y  0,  E_1F  is  symmetric  and  positive  definite  for 
both  these  methods.  However,  in  both  cases,  the  function  to  which  Newton’s 
method  is  applied  fails  to  exist  at  a  solution.  We  call  an  algorithm  a  Newton 
method  if  (AX,  Ay,  A Z)  is  derived  by  applying  Newton’s  method  to  a  func¬ 
tion  that  is  well  defined  for  all  X  P  0,  Z  P  0.  Under  this  definition,  of  the 
four  variants  defined  so  far,  only  XZ  +  ZX  is  a  Newton  method  for  SDP. 

In  the  special  case  that  SDP  is  an  LP,  the  XZ,  XZ  +  ZX  and  Nesterov- 
Todd  methods  coincide,  giving  the  XZ  method  for  LP,  which  is  a  Newton 
method. 

In  order  to  understand  the  asymptotic  behavior  of  Newton’s  method,  it  is 
important  to  analyze  the  Jacobian  at  the  solution  itself.  This  is  done  in  the 
next  section. 

3  The  Jacobian  at  the  Solution 

In  this  section  we  study  the  Jacobian  of  the  function  G,  appearing  on  the  left- 
hand  side  of  (20),  under  nondegeneracy  assumptions.  To  do  this,  we  use  the 
notions  of  nondegeneracy  that  were  introduced  by  the  authors  in  [AH095]. 

Definition  1.  Let  (X,  y,  Z)  solve  SDP,  with  an  orthogonal  matrix  Q  sat¬ 
isfying  (9).  Let  X  have  rank  r,  with  positive  eigenvalues  A|, . . .,  A,.,  and 
partition  Q  =  [Q\  Q 2],  where  the  columns  of  Q\  are  eigenvectors  correspond¬ 
ing  to  A 1 , . . . ,  Ar.  We  say  that  (X,  y,  Z)  satisfies  the  strict  complementarity 
and  primal  and  dual  nondegeneracy  conditions  if  the  following  hold: 

1.  rank(X)  =  n  —  r, 

2.  the  matrices 

Qi  QjAkQ2 

[  Q^AkQi  0 
are  linearly  independent  in  Sn ,  and 

3.  the  matrices 

QjAkQ  1,  for  k  —  1,2,...,  m, 

span  the  space  Sr . 

These  conditions  are  well  defined  even  if  Q  is  not  unique.  The  first  re¬ 
quirement  is  the  strict  complementarity  condition.  Conditions  (30),  (31)  are 
respectively  primal  and  dual  nondegeneracy  conditions  under  the  assumption 
of  strict  complementarity.  They  immediately  imply  the  inequalities 

r2  <  m  <  r2  +  r(n  —  r) 


(31) 


for  k  =  1,2, ...  ,m,  (30) 


(32) 


Semidefinite  Programming 


10 


(recalling  the  notation  (1)).  They  also  imply  uniqueness  of  the  primal  and 
dual  solutions.  Furthermore,  the  conditions  are  generic  properties  of  SDP, 
meaning  roughly  that  they  hold  with  probability  one  for  an  optimal  solution 
triple,  given  random  data  with  feasible  solutions.  For  motivation  of  these 
conditions,  and  further  details,  see  [A 1 1095]. 

The  strict  complementarity  condition  rank(X)  =  r,  rank(Z)  =  n  —  r 
implies,  using  (10),  that 

Ai  ^  Ar  >  Ar_|_i  =  . . .  =  \n  =  0,  (03) 

and 

0  =  u>\  —  ■  ■  ■  —  u)r  <  u)r+ 1  <  . . .  <  ion,  (34) 

Let  Bk  =  QT AkQ.  From  (62),  we  have 

svec Bk  =  (Q  ®Q)  svec  Ak. 

Recall  the  definition  (17),  and  define 

(svecBi)T 

B  =  :  ,  (35) 

.  (svec5m)T  . 

so  that 

A(Q  (?)  Q)  =  B. 

Each  column  of  B  corresponds  to  an  index  pair  (i,  j),  identifying  two 
columns  of  Q ,  with  1  <  i  <  j  <  n .  By  choosing  the  ordering  used  by  the 
svec  operator  appropriately,  we  may  write 

B  =  [Ci  C2  C3]  (36) 

where  Ci  contains  r2  columns  corresponding  to  1  <  i  <  j  <  n,  C2  contains 
r(n  —  r )  columns  corresponding  to  1  <  i  <  r,  r  +  1  <  j  <  n ,  and  C3 
consists  of  (n  —  r)2  columns  corresponding  to  r  +  1  <  i  <  j  <  n.  The 
primal  nondegeneracy  condition  (30)  holds  exactly  when  the  rows  of  [Ci  C2] 
are  linearly  independent,  i.e.  [Ci  C2]  has  rank  m.  The  dual  nondegeneracy 
condition  (31)  holds  exactly  when  Ci  has  rank  ?’2,  i.e.  the  columns  of  Ci  are 
linearly  independent.  Thus,  the  conditions  (30)  and  (31)  together  imply  that 
it  is  possible  to  choose  m  —  r2  columns  from  C2  so  that,  together  with  all  the 
columns  of  Ci,  they  form  a  nonsingular  m.  X  m.  matrix.  In  other  words,  we 
can  choose  an  ordering  for  the  columns  of  C2,  and  therefore  of  B,  so  that 

B  =  [Bi  B2]  (37) 


where  Bi  €  E"  x,,i  is  nonsingular. 
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Theorem  1  Consider  an  SDP  whose  solution  (X,  y ,  Z)  satisfies  the  strict 
complementarity  and  primal  and  dual  nondegeneracy  conditions.  Let  J  be 
the  Jacobian  of  the  function  G  defining  the  XZ  +  ZX  method,  evaluated  at 
( X,y,Z ).  Then  J  is  nonsingular. 


Proof:  We  have 


where  E  =  Z®  I  and  F  =  X  ©  I.  Let  P  =  Q  ®  Q,  and  let  S  =  Diag(P,  I,  P) , 
so  that 


'  0  Bt  I  " 

STJS  =  B  0  0 

T  0 

with  $  =  PrFP  and  T  =  PrEP.  Using  Lemma  2  (see  Appendix)  and  (9), 
we  see  that  PTP  =  I  and  $  and  T  are  diagonal  with  entries  A  +  A  j  and 
u)i  +  u>j,  1  <  i  <  j  <  n,  respectively.  Notice  that  the  diagonal  entry  of  $ 
corresponding  to  the  index  pair  (i,j)  is  zero  if  and  only  if  r  +  1  <  i  <  j  <  n 
(because  of  (33)),  while  the  diagonal  entry  of  T  corresponding  to  the  pair 
(/.  j)  is  zero  if  and  only  if  1  <  *  <  j  <  r  (see  (34)). 

Using  the  partitioning  of  B  in  (37),  we  have 

'0  0  Bf  I  O' 

0  0  Bf  0  I 

STJS  =  B!  B2  0  0  0  (38) 

Tt  0  0  0 

_  0  X2  0  0  $2_ 

where  T  =  Diag(Yi,  X2)  and  $  =  Diag(^i,  $2).  We  have  >-  0,  since 
none  of  the  columns  of  C3  are  included  in  Bi,  and  T2  >-  0,  since  all  of  the 
columns  of  Ci  are  included  in  Bi. 

Interchanging  the  first  and  third  rows  and  the  second  and  last  columns  of 
(38) ,  we  obtain 

Bi  0  0  0  B2 

0  I  Bf  0  0 

0  0  Bf  I  0  . 

Ti  0  0  $1  0 

_  0  $2  0  0  Y2_ 

We  shall  demonstrate  the  nonsingularity  of  this  matrix  using  block  Gauss 
elimination.  First,  subtract  Y1B]”1  times  the  first  block  row  from  the  fourth 
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block  row  to  eliminate  Yi  from  the  4,1  position.  This  does  not  otherwise 
change  the  lower  triangle  or  the  diagonal  blocks,  only  introducing  —  YiB]~1B2 
into  the  4,5  position.  Second,  subtract  $2  times  the  second  block  row  from  the 
fifth  row,  eliminating  $2  from  the  5,2  position;  this  introduces  —$2 B^  into 
the  5,3  position.  This  5,3  block  is  then  eliminated  by  adding  times 

the  third  row  to  the  fifth  row,  introducing  <t>2B^B]~r  into  the  5,4  position, 
giving 

Bi  0  0  0  B2 

0  i  bJ  0  0 

0  0  Bf  I  0 

0  0  0  $1  -YiH 

0  0  0  $2ht  y2 

where  H  =  B^”1B2.  In  order  to  show  that  this  matrix  is  nonsingular  we  need 
only  show  that  the  trailing  2  by  2  block  is  nonsingular,  or  equivalently  that 
its  positive  row  scaling 

I  -^YiH 

.YJ^H3,  I 

is  nonsingular.  A  final  step  of  block  Gauss  elimination  yields  a  block  upper 
triangular  matrix  with  last  diagonal  block  given  by 

1  +  Y^1$2Ht$^1Y1H. 

This  matrix  is  nonsingular,  since  it  is  of  the  form  I  +  N1N2  with  Ari  A  0  and 
A 2  y  0.  (The  product  of  two  symmetric  and  positive  semidefinite  matrices, 
though  nonsymmetric,  has  real  nonnegative  eigenvalues.) 

□ 

Corollary  1  Consider  an  SDP  whose  solution  ( X,y,Z )  satisfies  the  strict 
complementarity  and  primal  and  dual  nondegeneracy  conditions.  Suppose 
that  the  XZ  +  ZX  method  uses  o  =  0  and  a  —  (3  =  1  in  the  Basic  Iteration. 
Then,  there  exists  e  >  0  such  that,  if  the  iteration  is  started  at  (Xq,  yo,  Zq), 
with  ||(Xo,  t/oi  ■Z’o)  —  (X,  y,  X)  ||  <  e,  the  iterates  converge  Q-quadratically  to 
( A  •  y,  Z).  ' 

The  proof  of  Corollary  1  is  immediate  from  the  standard  convergence 
theory  for  Newton’s  method.  It  is  clear  that  Corollary  1  holds  also  for  less 
restrictive  assumptions  on  o,  a  and  /3.  See  [ZTD92]  for  relevant  results  for  LP. 
There  is  no  requirement  that  (Xo,  yo,  Zq)  lie  in  a  horn-shaped  neighborhood 
of  the  central  path,  or  even  in  the  feasible  region.  Note  that  the  assumptions 
of  Corollary  1  do  not  guarantee  positive  definite  iterates.  These  are  not 


Semidefinite  Programming 


13 


required  to  make  (20)  well-defined,  though  the  equivalence  of  (20)  with  (22)- 
(24)  does  not  hold  if  X  or  Z  is  singular.  In  practice,  conditions  (28)-(29) 
ensure  positive  definite  iterates. 

A  result  like  Theorem  1  does  not  hold  for  any  of  the  other  methods  dis¬ 
cussed  so  far.  As  already  noted,  the  function  to  which  Newton’s  method  is 
applied  is,  in  the  case  of  the  X~x  and  Nesterov-Todd  methods,  not  defined 
at  an  optimal  point.  For  the  XZ  method,  the  function  G  is  defined  at  the 
solution,  but  it  can  be  shown  that  the  Jacobian  J  is  always  singular  there. 
More  importantly,  bearing  in  mind  the  symmetrization  step,  an  example  can 
be  constructed  where  J  has  a  null  vector  (AX,  A y,  A Z)  with  AX  +  AAT  ^  0. 

It  is  well  known  that  a  result  like  Theorem  1  holds  for  the  XZ  method 
for  LP,  using  LP  nondegeneracy  assumptions. 

Nondegeneracy  assumptions  are  not  required  to  obtain  superlinear  con¬ 
vergence  results.  This  has  been  known  for  some  years  for  LP  [Wri96]  and 
is  the  subject  of  active  current  research  for  SDP.  However,  such  results  re¬ 
quire  that  the  iterates  of  a  method  stay  close  to  the  central  path.  Our  point 
here  is  that  classical  Newton  theory  applies  to  the  XZ  +  ZX  method,  under 
nondegeneracy  assumptions,  in  SDP  just  as  in  LP. 

4  Conditioning  of  the  Schur  Complement  Matrix 

In  this  section,  we  study  the  conditioning  of  the  Schur  complement  matrix 
M,  introduced  in  Section  2,  on  the  central  path.  It  is  important  to  note  that, 
when  started  on  the  central  path,  all  the  methods  discussed  so  far  generate 
the  same  first  iterate.  On  the  central  path,  X  and  Z  commute.  Therefore, 

E-1F  =  -X®X 

in  all  cases  except  the  XZ  method  for  which  we  have  E-1F  =  j^X  X .  In 
both  cases  the  Schur  complement  matrix  M  =  AE-1FAT  is  the  same. 

We  now  analyze  the  condition  number  of  M  on  the  central  path,  as  g  — >  0. 
We  begin  by  considering  its  rank  in  the  limit. 

Theorem  2  Assume  that  (XM,  y Z M)  lies  on  the  central  path  of  an  SDP 
whose  solution  ( X,y,Z )  =  lim^o  (X^^y^,  Z^)  satisfies  the  dual  nondegen¬ 
eracy  condition  (31),  with  r  =  rank(A).  Let  MT  be  the  Schur  complement 
matrix  defined  at  (X'fi  y Z v‘).  Then 

lim  (p  MP'j 

exists  and  has  rank  r2 . 
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Proof:  Clearly, 

H  1VR  N  =  A(X  ®  X)  At 

the  matrix  whose  (/,  k)  element  is  tr  (XAiXAk).  Let  Q  and  A,  satisfy  (9),  and 
write  A i  =  Diag(Ai, . .  Ar)  y  0,  with  corresponding  eigenvectors  collected 
in  Q\ .  so  that  X  =  QiA\Q\ .  Let  Ci  be  the  m  X  r2  matrix  introduced  in 
(36),  and  let  Di  be  the  r2  X  r2  diagonal  matrix 

Di  =  Diag(AjAj),  1  <  *  <  j  <  r, 

using  consistent  orderings  for  Ci  and  Di.  Then 

N  =  CiDiCf 

since  the  (/,  k)  element  of  the  right-hand  side  is 

tr  (A  1QjAiQ1A1QfAkQi)  =  tr  (XAtXAk). 

Since,  by  the  dual  nondegeneracy  assumption,  Ci  has  linearly  independent 
columns,  and  since  Di  y  0,  this  completes  the  proof  of  the  theorem.  □ 

Recall  that  the  condition  number  of  a  symmetric  positive  definite  matrix 
is  Kmax/Kmim  where  Kmax  and  Kmin  are  respectively  its  largest  and  its  smallest 
eigenvalues. 

Theorem  3  Suppose  that  the  assumptions  of  Theorem  2  hold.  Then,  if  m  > 
r2  >  0,  the  condition  number  o/MT  ( equivalently  of  p is  bounded  below 
by  a  positive  constant  times  1/ g. 

Proof:  Let  Q X1-  satisfy  (6),  and  let  B^,  be  the  matrices  introduced 
in  Section  3,  evaluated  at  (XM,  yC  Z^).  Using  Lemma  2  (see  Appendix),  we 
have 

//  MT  =  A{X^  ®  X*)  At  =  B'-D'^B'-)7  (39) 

where  D^'  is  the  diagonal  ri2  X  n2  matrix 

=  Diag(AfA^),  1  <i<j<n.  (40) 

The  primal  solution  rank  r  defines  a  splitting 

D^DiagfD*  D£,D£) 


consistent  with  (36) ,  so  that 

//  =  cfDf(cf)T  +  c'2d'2{c'.2)t  +  c|;d';(c'()7’. 


(41) 
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Here  the  entries  of  the  diagonal  matrices  D^,  and  Dg  are  with 

the  indices  1  <  i  <  j  <  r  for  D^,  \  <i<r<j<n  for  Di),  and 
r+l<?<j<n  for  Dg.  Although  Q^‘  and  do  not  generally  converge 
as  g  — »  0,  Theorem  2  shows  that  /iM'1  — »  N  =  C1D1Cf,  with  rank  r2.  By 
assumption,  m  >  r2  >  0,  so  the  largest  eigenvalue  of  N  is  positive  and  the 
smallest  is  zero.  The  norms  of  the  second  and  third  terms  in  (41)  are  0(g),  so 
the  largest  and  smallest  eigenvalues  of  g  iVP  are,  respectively,  bounded  away 
from  zero,  and  0(g).  (Here  we  use  the  fact  that  eigenvalues  of  a  symmetric 
matrix  are  Lipschitz  continuous  functions  of  the  matrix  entries.)  □ 

For  LP,  a  result  like  Theorem  3  does  not  hold.  On  the  contrary,  under 
assumptions  of  LP  nondegeneracy  and  strict  complementarity,  the  condition 
number  of  the  LP  Schur  complement  matrix  is  bounded  independent  of  g. 

5  Stability 

We  have  seen  in  the  previous  section  that,  for  nondegenerate  SDP’s,  the  con¬ 
dition  number  of  the  Schur  complement  matrix,  evaluated  on  the  central  path, 
is  bounded  below  by  a  positive  constant  times  1  f  g  (ruling  out  the  exceptional 
cases  r2  =  m  and  r  =  0).  Consequently,  one  expects  that  as  g  —>  0,  the  com¬ 
putation  of  Ay  in  (22)  will  become  increasingly  less  accurate.  Indeed,  in  our 
original  implementations  we  observed  numerical  instability  leading  to  signif¬ 
icant  loss  of  primal  feasibility  near  a  solution.  Recently,  however,  Todd,  Toh 
and  Tutiincii  [TTT96]  found  that  high  accuracy  is  achievable.  The  main  issue 
is  the  choice  of  formulas  for  Ay  and  AX.  Several  mathematically  equivalent 
choices  are  possible,  but  these  have  quite  different  stability  properties. 

Formulas  for  Ay  and  AX  are  given  in  (22)  and  (23).  Both  include  the 
term  Fr^  —  rc.  For  the  XZ  +  ZX  method,  this  term  (in  matrix  form)  is 

i  ({X(C-Z  -  mat  A  Ty)  +  (C  -Z  -  matATt/)X)-  (2  gl  -  XZ  -  ZX)) 

which  can  be  rewritten  as 

—  {^(X(C  —  mat  ATy)  +  (C  —  mat  A  Ty)X)  —  2  gl^j  . 

However,  using  this  simplification  to  modify  (22)  and  (23)  leads  to  instability 
and  loss  of  primal  feasibility.  It  is  much  better  to  implement  (22)  and  (23) 
directly.  This  is  done  in  the  computational  experiments  reported  in  Section  7. 

The  same  issue  applies  to  the  XZ  method.  However,  direct  implementa¬ 
tion  of  (22)  and  (23)  does  not  give  good  results  for  the  XZ  method.  Instead1, 

1The  discussion  here  is  motivated  by  [TTT96]. 
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we  use  the  fact  that  E  1F  is  symmetric  positive  definite  to  write 
E-1F  =  Z~x  <gs  X  =  GtG,  G  =  M-1  <g)  Lt, 


where  L  and  M  are  respectively  Cholesky  factors  of  X  and  Z.  i.e. 

x  =  llt,  z  =  mmt. 


Noting  that  the  first  block  in  the  right-hand  side  of  (21)  is 
u  =  vec  U  =  vec  (C  —  gX~l  -  mat  ATy), 


we  see  that  (21)  is  equivalent  to 


1 

Ei 

>< 

i— i 

1 

1 _ 

Ax 

u 

1 - 

o 

1 _ 

.  Ay  . 

.  rv  . 

Ax  =  G  T  Ax  =  vec  [L  1  AX  M)  =  vec  AX, 
u  =  G u  =  vec  (LtU M~t)  =  vec  U, 


and 

A  =  AGt 

The  solution  is  given  by 


'  (  vec/,r..1,.V/-r)r  ' 
(vec  LT  AmM~T)T 


(AAr)Ay  =  rv  +  A  u 


(which  may  be  solved  with  a  Cholesky  factorization)  and 

AX  =  LAXM-1  =  L(  mat  AT Ay  -  U)M~1. 


(42) 


(43) 


(44) 


(45) 


This  last  equation  can  be  written  in  many  ways,  three  of  which  are 

AX  =  !.  (//( mat  A1  Ay).\rT  //'  C.\rr’)  M~'  (46) 

=  LLT(matATAy)M-TM-1-LLTUM-TM~1  (47) 
=  LLt  (( mat  AT Ay)  -  U )  .\l~'r  M~' .  (48) 

Of  these  four  mathematically  equivalent  formulas,  (45)  and  (46)  give  the 
highest  accuracy,  with  smallest  loss  of  primal  feasibility.  We  used  (45)  in  our 
computational  experiments,  with  Ay  defined  by  (44). 

For  the  Nesterov- Todd  method,  E-1F  is  also  symmetric  positive  definite, 
so  similar  considerations  apply;  see  [TTT96]. 
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6  The  Q  Method 

In  this  section  we  change  direction,  deriving  an  alternative  primal-dual  interior- 
point  method  that  generates  iterates  ( X ,  y,  Z )  with  the  property  that  X  and 
Z  commute,  i.e.  XZ  =  ZX .  This  is  motivated  by  the  fact  that  this  property 
holds  for  all  points  on  the  central  path.  Instead  of  treating  the  variables  X 
and  Z  directly,  we  introduce  as  variables  the  eigenvalues  of  X  and  Z  and 
their  common  set  of  eigenvectors.  In  other  words,  the  variables  consist  of  an 
orthogonal  matrix  Q,  diagonal  matrices  A  and  Q  and  a  vector  y  G  Rn  that 
must  satisfy 

m 

QQQt  +  yk.Ak  =  C, 

k=  1 

Ak  •  ( QAQt )  =  bk ,  k  =  1, . .  . ,  m  (49) 

All  =  /if . 

This  defines  a  map  from  On  xR2n+m  to  where  On  is  the  Lie  group 

of  orthogonal  matrices  with  determinant  one,  whose  dimension  is  n ( n  —  1) /2. 
(Since  the  signs  of  eigenvectors  are  arbitrary,  it  is  not  a  restriction  to  impose 
det  Q  =  1.  )  The  price  paid  for  the  diagonalization  is  the  nonlinear  appearance 
of  the  variable  Q  in  the  feasibility  equations. 

Let  Kn  denote  the  space  ofnXH  skew-symmetric  matrices,  and  consider 
the  exponential  map  from  Kn  to  On  defined  by 

exp  (5)  =  I +  S+  ^,S'2  +  •  •  • 

This  map  is  smooth,  onto,  and  in  a  neighborhood  of  0,  also  one-to-one.  Bor¬ 
rowing  a  technique  used  by  [OW95],  we  derive  a  form  of  Newton’s  method 
based  on  parameterizing  On  near  a  given  point  Q  by  Q  exp(,S') .  Let  kvec 
be  an  isometry  from  Kn  to  K”(”-1)/2,  stacking  the  upper  triangular  entries 
of  a  skew-symmetric  matrix  in  a  vector,  with  a  factor  of  \/2  to  preserve  the 
inner  product.  Let  us  use  the  convention  s  =  kvec  (.S'),  A  =  Diag(A)  and 
Q  =  Diag(uj).  Define 

"  vec  (C  —  Q  exp(,S)fl  exp(— S)QT)  —  A T y~ 

Gq(X,  y,  u,  s)  =  6  -  A  vec(Q  exp(,S)Aexp(-,S')(5:r)  ,  (50) 

AQe  —  ge 

The  function  Gq  maps  Rn'+n+m  to  itself.  Note  that  the  third  component  of 
Gq  has  the  form  familiar  from  LP. 
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Given  an  iterate  (X,  y ,  Z)  —  (QAQT ,  y.  QQ,QT),  we  obtain  a  new  iterate 
by  applying  Newton’s  method  to  the  equation  Gq  =  0  at  the  point  (A,  y,  lj,  0). 
The  Newton  step  (AA,  Ay,  Au>,  s )  is  obtained  by  replacing  exp(,S')  by  I  +  S 
and  discarding  second-order  terms.  The  resulting  ( n 2  +  n  +  m)  X  ( n 2  +  n  +  m) 
linear  system  is: 


aq  +  sq  -  ns  +  AykBk  =  h-q  (51) 

k= 1 

Bk  •  (AA  +  SA  —  AS)  =  bk  -  Bk  •  A,  k  —  (52) 

AAO  +  OAA  =  gl-An  (53) 

where  Bk  =  QT AkQ  and  H  =  QTCQ  -  J2k=i  lJkBk. 

The  basic  iteration  for  the  Q  method  is  therefore: 

1.  Choose  0  <  cr  <  1  and  define 

A  tlj 

ji  =  <j - . 

n 

2.  Determine  (AA,  Ay,  Alo,  s)  from  (5 1)  (53) . 

3.  Choose  steplengths  a,  (3,  7  and  update  the  iterates  by 

A  f-  A  +  a  AA 

y  V  +  P  Ay 

n  4-  fi  +  p  aq 

Q  <r-  Qil+^SW-foS)-1. 

A  simple  steplength  rule  is  a  =  min(l,  rot),  f3  =  min(l,  r/3),  and  7  =  s/a/3, 
where  a  and  /3  are  steps  to  the  boundary  of  the  positive  orthant.  The  mul¬ 
tiplicative  factor  updating  Q  is  the  Cayley  transform,  an  easily  computed 
orthogonal  matrix  that  approximates  the  matrix  exponential  to  second  order. 

The  equations  defining  the  Q  method  can  be  rewritten  as  follows.  First 
note  that  (52)  can  be  rewritten  as 

Bk  •  AA  +  tr  ((A Bk  -  BkA)S )  =  bk  -  Bk  •  A 


and  write 


v  = 


61  —  B\  •  A 

bm  Bm  •  A 
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Let  diag(Rj-)  be  the  vector  consisting  of  the  n  diagonal  entries  of  B k  and 
offdiag(i?A-)  be  the  vector  consisting  of  the  n(n  —  l)/2  entries  of  the  upper 
triangle  of  Bk ,  ordered  consistently  with  the  ordering  chosen  for  the  kvec 
operator.  Define 

L  =  [diag(-Bi)  •••  diag(Rm)]T, 

R  =  [ofFdiag(Si)  offdiag(Rm)  ]T  . 

Let 

D  =  Diag(A;  -  Aj),  E  =  Diag(u>j  -  Uj), 

diagonal  matrices  of  size  n(n  —  1) /2  (corresponding  to  1  <  i  <  j  <  n),  whose 
orderings  are  also  consistent  with  that  of  the  kvec  operator.  Then,  writing 
the  diagonal  and  off-diagonal  parts  of  (51)  separately,  we  get  the  linear  system 

'0  0  Lt  /I  r AA 

0  E  Rt  0  s 

L  HD  0  0  Ay 

0  0  A.  .Am 

We  denote  the  matrix  on  the  left-hand  side  of  (54)  by  J q. 

Let  (X,y,Z)  be  a  solution  of  SDP  satisfying  (33),  (34).  The  matrix  Q 
simultaneously  diagonalizing  X  and  Z  is  unique  (up  to  signs  of  its  columns) 
if  and  only  if 

Ai  >  •  •  •  >  Ar  >  0  and  0  <  u>r+i  <  •  •  •  <  u„.  (55) 

Theorem  4  Let  ( X ,  y ,  Z)  =  (QAQT ,  y,  QQQT)  be  a  solution  of  SDP  satisfy¬ 
ing  the  strict  complementarity  and  primal  and  dual  nondegeneracy  conditions, 
and  also  condition  (55).  Then  the  matrix  J q,  evaluated  at  the  solution,  is 
nonsingular. 

Proof:  First  note  that  the  assumptions  on  the  eigenvalues  imply  that  the 
element  of  the  diagonal  matrix  D  corresponding  to  the  index  pair  (i,j)  is 
zero  if  and  only  if  r  +  1  <  i  <  j  <  n,  while  the  element  of  the  diagonal  matrix 
E  corresponding  to  (?'.  j)  is  zero  if  and  only  if  1  <  i  <  j  <  r.  Let  us  rewrite 
Jq  as 

'0  0  0  Lf  /  O' 

0  0  0  0  I 

0  0  E  Rt  0  0 

Li  L2  RT  0  0  0 

0  0  0  0  Ai  0 

_  0  tt-2  0  0  0  0. 
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where  Ai  y  0  and  ^2  >-  0.  As  in  the  proof  of  Theorem  1,  the  nondegeneracy 
assumptions  permit  us  to  collect  all  r  columns  of  Li  and  m—  r  columns  of  R 
together  in  a  nonsingular  to  X  to  matrix  Bi.  We  collect  the  remaining  n(n  — 
1) / 2  —  m.  +  r  columns  of  R  in  a  matrix  R2,  and  partition  D  =  Diag(IAi,  D2) 
and  E  =  Diag(£d,  E2 )  accordingly.  Observe  that  D\  y  0  since  the  columns 
of  Bi  correspond  to  index  pairs  ( i,j )  with  A ,■  >  A j.  Likewise  —  E2  X  0  since 
all  columns  corresponding  to  index  pairs  (i,j)  with  u>i  —  u)j  =  0  are  contained 
in  Bi. 

Let  D  =  Diag(/,iAi)  and  E  =  Diag(0,£i).  Permuting  the  rows  and 
columns,  J q  becomes 


E 

0 

0 

Bj 

I 

0 

0 

0 

0 

Lf 

0 

I 

0 

0 

E2 

0 

0 

Bi  D 

L2 

R2^2 

0 

0 

0 

0 

0 

0 

0 

Ai 

0 

0 

q2 

0 

0 

0 

0 

where  I  is  an  to  by  r  matrix  containing  r  rows  of  the  r  by  r  identity  matrix 
and  to  —  r  zero  rows.  Interchanging  the  first  and  fourth  rows  and  the  second 
and  last  columns,  this  becomes 

Bi  D  0  R2-D2  0  0  L2 

0  I  0  Lj  0  0 

0  0  E2  Rf  0  0 

E  0  0  B'(  i  0 

0  0  0  0  Ai  0 

.0  0  0  0  0  n2. 

Performing  Gauss  block  elimination  on  this  matrix  we  see  that  its  nonsingu¬ 
larity  is  equivalent  to  the  nonsingularity  of 

Bf  +  ED~iB~1'R2D2E-1'R^ . 

Multiplying  on  the  left  by  we  obtain  the  matrix 

I  +  {B^ED^B^iB^E-1!^). 

This  is  nonsingular  since  it  is  of  the  form  I  +  NiN2  with  N 1,  N2  symmetric 
negative  semidefinite.  □ 

Corollary  2  Consider  an  SDP  whose  solution  satisfies  the  strict  comple¬ 
mentarity  and  primal  and  dual  nondegeneracy  conditions,  and  also  condition 
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(55).  Suppose  that  the  Q  method  uses  a  =  0  and  a  =  fi  =  7  =  1.  Then ,  if 
the  method  is  started  with  A,u>,  y  and  Q  initialized  sufficiently  close  to  their 
values  at  the  solution,  the  iterates  converge  Q-quadratically  to  the  solution. 


The  proof  of  Corollary  2  is  more  technical  than  that  of  Corollary  1,  and 
is  omitted.  It  is  necessary  to  establish  that  quadratic  convergence  is  not 
impeded  by  either  (a)  the  use  of  the  Cayley  transform  to  approximate  the 
matrix  exponential  or  (b)  the  dependence  of  the  definition  of  Gq  on  Q. 

As  with  the  other  methods,  we  see  how  to  efficiently  implement  the  Q 
method  by  performing  block  Gauss  elimination  directly  on  Jq,  without  par¬ 
titioning  the  blocks.  The  first  step  yields 


"  -A-1S2  0  Lt  ■ 

0  D~lE  Rt 

<'* 

1 _ 

— 

"  diag(LT  —  fih~l ) ' 
offdiag  ( H ) 

L  R  0 

L  AyJ 

V 

where  S  =  kvec  s  is  the  symmetric  matrix  defined  by 

Sij  =  (A*  —  A  j)Sij. 


One  more  step  of  block  elimination  then  gives  the  Schur  complement 


Mq  =  [L  R] 


Mr1 

0 


0 

- DE -1 


"j 

rLTi 

rt. 

(56) 


As  in  LP,  the  center  factor  of  the  Schur  complement  is  diagonal,  with  entries 
A,  .  A,  A , 


1  <  i  <  n  and 


Uj  —  LO; 


l  <  i  <  j  <  n. 


Of  course,  the  L  and  R  blocks  are  not  independent  of  the  iteration  count,  as 
they  are  in  LP. 

The  Q  method  does  not  require  computing  eigenvalues.  The  variables 
Q ,  A  and  u  are  all  updated  using  rational  operations.  This  is  in  contrast 
with  the  XZ  +  ZX  method  which  requires  the  computation  of  eigenvalues  in 
two  places:  the  formation  of  the  Schur  complement  matrix  M  (to  solve  the 
Lyapunov  equations)  and  in  the  steplength  computation  (to  find  the  step  to 
the  boundary).  Finally,  note  that  the  Schur  complement  matrix  is  symmetric 
for  the  Q  matrix,  but  not  for  the  XZ  +  ZX  method. 

When  evaluated  on  the  central  path,  the  Schur  complement  matrix  Mg 
for  the  Q  method  is  equal  to  the  Schur  complement  matrix  MT  for  the  XZ 
and  XZ  +  ZX  methods,  assuming  that  (55)  holds.  To  see  this,  let  L^1,  RM, 
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D*1,  E11  denote  the  matrices  L,  R,  D ,  E  evaluated  on  the  central  path.  We 
have  A^(£2M|_1  =  Diag(A^/wf)  =  ^-Diag((Af)2)  and 


-DI*(E11)-1 


Diag( 


LL  U  / 

uj  ~  < 


-l-Diag(AfA'i) 


Thus, 


Mq  =  -Ba'D^(Ba')T  =  M 

fi 


using  (39)  and  (40). 

Although  the  Q  method  has  some  attractive  features,  it  is,  at  present,  not 
a  practical  alternative  to  the  other  algorithms.  When  initialized  far  from  the 
solution,  convergence  is  generally  not  obtained.  However,  the  quadratic  local 
convergence  established  here  is  observed  in  practice. 


7  Computational  Results 

In  this  section  we  report  on  the  results  of  some  extensive  numerical  experi¬ 
ments.  We  start  by  discussing  some  important  implementation  details. 

Mehrotra’s  predictor-corrector  (PC)  rule  is  a  well  known  technique  in  LP 
[Wri96j.  It  can  easily  be  extended  to  the  XZ  and  XZ  +  ZX  methods,  as 
follows.  In  our  implementation,  we  use  the  PC  rule  only  if 

INI  +  ||-Rd||  <  p, 

where  p  is  a  given  threshold  value  (10-4  in  the  runs  reported  here).  This  pre¬ 
vents  the  PC  rule  from  being  applied  until  an  approximately  feasible  point  has 
been  reached.  When  the  condition  does  not  hold,  we  use  the  Basic  Iteration 
instead,  with  o  =  0.25. 

XZ  and  XZ+ZX  Methods  with  Mehrotra  Predictor- Corrector  Rule 

1.  Determine  AX ,  Ay,  A Z  from  (20),  using  p  =  0  in  (18),  and  symmetrize 
AX  in  the  case  of  the  XZ  method. 

2.  Choose  steplengths  a 

o  = 

X.X 


0  using  (28)  (29) ,  and  define 
(X  +  o'  AX)  •  (Z  +  0AZ)  \  3 


x*x 


(57) 


fi  =  G 


n 
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3.  Redetermine  AX,  Ay,  A X  from  (20),  using 

I  yl  -  (XZ  +  AX  A Z)  XZ  method  1 

c  III  -  |  (XZ  +  XX  +  AX  AX  +  AX  AX)  XZ  +  XX  method  J  5 

symmetrize  AX  in  the  case  of  the  XZ  method,  and  update  the  iterates 

by 

X  <-  X  +  oAX 
V  y  +  /3  Ay 

X  <-  X  +  p  AZ, 

with  a,  (i  given  by  (28)-(29). 

See  [TTT96]  for  a  PC  version  of  the  Nesterov-Todd  method.  (Our  exper¬ 
iments  use  (57)  in  the  implementation  of  all  the  methods,  although  [TTT96] 
use  the  exponent  2  instead  of  3  in  (57).) 

Computational  results  are  presented  in  Tables  1  through  4.  Tables  1,  2 
and  3  report  results  for  randomly  generated  problems,  for  m.  =  n,  with  the 
Ak  generated  first,  and  b  and  C  chosen  to  ensure  existence  of  strictly  feasible 
primal  and  dual  points.  Such  problems  are  expected  to  be  nondegenerate.  All 
methods  were  initialized  with  the  infeasible  starting  point  (X,  y,X)  =  (7,0,/). 
Table  1  shows  results  for  the  XX  +  XX,  XX  and  NT  Basic  Iteration,  using 
c r  =  0.25  in  (27),  with  various  choices  for  the  steplength  parameter  r  in  (28), 
(29).  We  also  implemented  the  X-1  method  but  found  it  required  many  more 
iterations  than  the  others  with  the  same  parameter  choices.  Table  2  shows 
results  for  the  PC  variants,  with  threshold  p  =  10-4.  Table  3  compares  the 
PC  algorithms  (with  r  =  0.99)  for  different  values  of  n.  In  all  cases,  part  (a) 
of  the  table  shows  the  number  of  iterations  required  to  reduce  the  duality  gap 
by  a  factor  of  1012,  averaged  over  100  problems  (20  for  Table  3).  Part  (b) 
shows  the  final  value  of 


logio(IMI +  11^11). 

averaged  over  the  same  data.  A  run  was  considered  to  be  a  failure  (indicated 
by  F)  if  the  duality  gap  and  infeasibility  norm  were  not  reduced  to  at  least 
10-4;  these  runs  are  not  included  in  the  average  statistics.  The  >  symbol 
in  some  iteration  counts  indicates  that,  in  at  least  one  run,  the  duality  gap 
was  not  reduced  by  the  desired  factor  of  1012  before  the  maximum  number 
of  iterations  was  exceeded.  All  experiments  were  conducted  in  Matlab,  using 
IEEE  double  precision  arithmetic. 

Let  us  first  consider  the  results  shown  in  Table  1  for  the  Basic  Iteration 
without  the  PC  rule.  For  r  =  0.9,  all  three  methods  show  essentially  the  same 
number  of  iterations.  The  XX  +  XX  method  achieves  the  highest  accuracy 
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(in  terms  of  feasibility) .  More  aggressive  choices  of  the  steplength  parameter 
have  little  effect  on  the  XZ  +  ZX  method  but  cause  difficulties  for  the  XZ 
and  NT  methods.  Choosing  r  =  0.999  causes  the  XZ  and  NT  methods  to 
fail  in  many  cases. 

Table  2  shows  the  same  experiment  using  the  PC  rule.  With  r  =  0.9,  the 
PC  rule  greatly  reduces  the  number  of  iterations,  though  with  some  loss  of 
feasibility  for  the  XZ  and  NT  methods.  More  aggressive  choices  of  r  give  a 
significantly  reduced  number  of  iterations  (without  loss  of  feasibility)  for  the 
XZ  +  ZX  method,  but  lead  to  many  failures  for  the  XZ  and  NT  algorithms. 

Turning  to  Table  3,  we  see  an  iteration  count  which  is  essentially  constant 
as  n  increases,  though  with  some  loss  of  feasibility  for  larger  n.  Primal 
feasibility  can  be  regained  by  projecting  onto  the  set  { x  :  Aa1  =  b},  but  this 
generally  fails  to  give  a  more  accurate  solution,  as  the  duality  gap  usually 
increases. 

Table  4  shows  results  for  the  Lovasz  9  function  [GLS88]  for  randomly 
generated  graphs  with  varying  edge  density.  Here  n  is  the  number  of  vertices 
in  the  graph  and  m  —  1  is  the  number  of  edges. 

We  also  implemented  the  Q  method  and  observed  that  it  has  essentially 
the  same  rapid  local  convergence  and  high  accuracy  properties  as  the  XZ  + 
ZX  method,  although  when  initialized  far  from  the  solution,  it  generally  fails 
to  converge. 

The  results  show  clearly  that  the  XZ+ZX  PC  method  is  the  most  efficient 
in  terms  of  number  of  iterations,  and  is  the  most  robust  with  respect  to  its 
ability  to  step  close  to  the  boundary.  We  make  no  claim  regarding  the  best 
overall  method  in  practice,  when  costs  per  iteration  are  taken  into  account. 

Acknowledgments.  The  third  author  is  grateful  to  Richard  Tapia,  Bob 
Vanderbei  and  Steve  Wright  for  introducing  him  to  primal-dual  interior-point 
methods  for  LP.  The  authors  also  thank  Mike  Todd  for  many  helpful  com¬ 
ments,  especially  a  suggestion  concerning  the  Q  method  that  makes  the  de¬ 
velopment  given  here  substantially  simpler  than  the  one  given  in  [AH094a], 
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Method 

r  =  0.9 

t  =  0.99 

r  =  0.999 

,YZ  +  XX 

21.5 

21.1 

21.1 

xz 

21.7 

22.1 

>  23.3  (F:  11%) 

NT 

21.6 

>  21.9 

>  24.9  (F:  25%) 

Number  of  iterations  to  reduce  gap  by  1012 
Averaged  over  100  randomly  generated  problems 
Basic  iteration  with  o  =  0.25 
Starting  infeasible,  n  =  20,  to  =  20 
F:  Failed  to  reduce  gap  and  feasibility  norm  to  10-4 
Table  la 


Method 

r  =  0.9 

t  =  0.99 

r  =  0.999 

XZ  +  xx 

-12.6 

-12.6 

-12.6 

xz 

-11.1 

-11.0 

-11.0  (F:  11%) 

NT 

-10.9 

-10.8 

-10.9  (F:  25%) 

Log  norm  infeasibility 
Averaged  over  same  data 


Table  lb 


Method 

t  =  0.9 

r  =  0.99 

t  =  0.999 

xx  +  xx 

15.0 

10.2 

9.3 

xz 

16.3 

14.7 

>  16.9  (F:  44%) 

NT 

15.6 

>  22.8 

>  30.0  (F:  98%) 

Number  of  iterations  to  reduce  gap  by  1012 
Averaged  over  100  randomly  generated  problems 
Mehrotra  predictor-corrector  rule 
Starting  infeasible,  n  =  20,  to  =  20 
F:  Failed  to  reduce  gap  and  feasibility  norm  to  10-4 
Table  2  a 


Method 

r  =  0.9 

r  =  0.99 

r  =  0.999 

JZ  +  ZX 

-12.1 

-12.1 

-12.2 

XX 

1 

00 

bo 

1 

00 

bo 

-9.5  (F:  44%) 

NT 

-9.1 

-8.5 

-11.3  (F:  98%) 

Log  norm  infeasibility 
Averaged  over  same  data 


Table  2  b 
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Method 

s 

II 

II 

to 

o 

s 

II 

II 

o- 

o 

s 

II 

II 

00 

o 

IZ  +  ZX 

10.2 

10.5 

10.4 

XZ 

14.7 

14.8 

14.6 

NT 

>  22.8 

>  23.1 

>  22.0 

Numl 

ber  of  iterations  to  reduce  gap  by  1012 

Averaged  over  20  randomly  generated  problems 
Mehrotra  predictor-corrector  rule  with  r  =  0.99 
Starting  infeasible 
Table  3a 


Method 

s 

II 

II 

to 

o 

s 

II 

II 

o 

s 

II 

II 

00 

o 

XZ  +  zx 

-12.1 

-11.3 

-10.5 

XZ 

1 

00 

bo 

-8.4 

-7.8 

NT 

-8.5 

-7.9 

-7.5 

Log  norm  infeasibility 
Averaged  over  same  data 


Table  3b 


Method 

Density=  0.25 

Density=  0.5 

Density=  0.75 

XZ  +  ZX 
XZ 

NT 

9.1 

>  13.6  (F:  7%) 

>  12.3 

9.2 

>  17.4  (F:  9%) 
13.6 

8.7 

12.4  (F:  1%) 

13.1 

Lovasz  9  function 

Number  of  iterations  to  reduce  gap  by  1012 
Averaged  over  100  randomly  generated  problems 
Mehrotra  predictor-corrector  rule  with  t  =  0.99 
Starting  infeasible,  n  =  10 

F:  Failed  to  reduce  gap  and  feasibility  norm  to  10-4 
Table  4a 


Method 

Density=  0.25 

Density=  0.5 

Density  =  0.75 

XZ+ZJ 

XZ 

NT 

-14.5 

-12.2  (F:  7%) 
-12.0 

-14.5 

-12.4  (F:  9%) 
-12.1 

-14.6 

-13.2  (F:  1%) 
-12.5 

Log  norm  infeasibility 
Averaged  over  same  data 


Table  4b 


Semidefinite  Programming 


27 


Appendix.  Symmetric  Kronecker  Products 

Consider  the  linear  operator  on  K”x”  defined  by  the  map 

K  NKMt  (58) 

where  M,  N  6  KnX”.  It  is  standard  to  represent  this  linear  operator  by  the 
Kronecker  product 


M  ®  N 


MniV 


MnlN 


MnlN  - 
MnnN  _ 


2 

where  nvec  maps  K”x”  to  K”  ,  stacking  the  columns  of  a  matrix  in  a  vector, 
since  then 

(M  0  N )  nvec  (A')  =  nvec  ( NKMT ).  (59) 

Other  Kronecker  product  identities  include 

(M  C  N)~l  =  M~l  0  N-1  and  (M  0  N)  (K  ®  L)  =  MIC  ®  NL.  (60) 
Now  consider  the  linear  operator  on  Sn  defined  by  the  map 

K  i( NKMt  +  MKNt)  (61) 

where  M,  N  €  M”x”.  To  represent  this  map  as  a  matrix,  define  M  ®  N  by 
the  identity 

(M  ®  N)  svec  ( K )  =  svec  (  — ( NKMT  +  MKNT ) )  (62) 

2" 

where  svec  maps  Sn  to  by 

T 

svec  ( K )  =  [-An,  V2  A'i 2 , . .  ■ ,  v  2A'im,  A'22, . . . ,  \Z2K2n, . ,  Knn 

(63) 

Note  that 

K  •  L  =  svec  ( K)T  svec  (A). 

Of  course,  the  ordering  used  in  (63)  is  arbitrary:  the  important  point  is  that 
each  element  of  svec  (K)  is  associated  with  an  index  pair  with  i  <  j. 

The  ordering  chosen  for  svec  dictates  a  corresponding  ordering  for  ®. 

We  call  the  matrix  M  ®  N  a  symmetric  Kronecker  product.  Note  the 
identity 

M  ®  N  =  if  ®  M.  (64) 
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Furthermore  (M  ®  M)_1  =  M~l  ®  M_1,  but  (M  ® IV)-1  7^  M_1  ®  iV_1,  in 
general. 

We  need  the  following  lemmas  whose  proof  are  straightforward. 

Lemma  1  Let  V  €  R”Xn  and,  let  Vi,  1  <  i  <  n,  denote  the  columns  of  V. 
The  ( i,j )  column  ofV®V ,  1  <  i  <  j  <  n;  is  the  vector 

{svec  ( Vivf )  if  i  =  j 

^  svec  (vivj  +  v:jvf )  if  i  <  j 

Lemma  2  Let  M ,  N  be  commuting  symmetric  matrices,  and  let  o] , . , . .  n „ . 

denote  their  eigenvalues  with  vi,...,vn  a  common  basis  of  or¬ 
thonormal  eigenvectors.  The  n(n  +  l)/2  eigenvalues  of  M  ®  N  are  given 
by 

2  +  AQ'i)  I  1  <i<  j  <n, 

with  the  corresponding  set  of  orthonormal  eigenvectors 

{svec  ( V{vf )  if  i  =  j 

^7f  svec  +  vjvf)  if  i  <  j 

In  other  words,  if  V  =  [%  •  •  then  V  ®  L  is  an  orthogonal  matrix  of 
size  n2  X  n2  which  diagonalizes  M  ®  N.  The  standard  algorithm  for  solving 
the  Lyapunov  equation  MXNT  +  NXMT  =  B  (when  M  and  N  commute) 
immediately  follows:  the  solution  is  VCVT ,  where  C  is  found  by  computing 
V tBV  and  dividing  its  entries  by  the  quantities  ( a +/!,&_?)  componentwise. 
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